We describe a technique for simultaneously classifying and estimating the redshift of
quasars. It can separate quasars from stars in arbitrary redshift ranges, estimate full
posterior distribution functions for the redshift, and naturally incorporate flux
uncertainties, missing data, and multi-wavelength photometry. We build models of quasars
in flux-redshift space by applying the extreme deconvolution technique to estimate the
underlying density. By integrating this density over redshift one can obtain quasar
flux-densities in different redshift ranges. This approach allows for efficient,
consistent, and fast classification and photometric redshift estimation. This is achieved
by combining the speed obtained by choosing simple analytical forms as the basis of our
density model with the flexibility of non-parametric models through the use of many simple
components with many parameters. We show that this technique is competitive with the best
photometric quasar classification techniques (which are limited to fixed, broad redshift
ranges and high signal-to-noise ratio data) and with the best photometric redshift
techniques when applied to broadband optical data. We demonstrate that the inclusion of UV
and NIR data significantly improves photometric quasar-star separation and essentially
resolves all of the redshift degeneracies for quasars inherent to the ugriz filter system,
even when included data have a low signal-to-noise ratio. For quasars spectroscopically
confirmed by the SDSS, 84 and 97 percent of the objects with GALEX UV and UKIDSS NIR data
have photometric redshifts within 0.1 and 0.3, respectively, of the spectroscopic
redshift; this amounts to about a factor of three improvement over ugriz-only photometric
redshifts. Our code to calculate quasar probabilities and redshift probability
distributions is publicly available.

Subject headings: catalogs — cosmology: observations — galaxies: distances 
and redshifts — galaxies: photometry — methods: data analysis — quasars: 
general 



1. Introduction 



The last decade has seen the first instances of statistical studies with quasars using
purely photometric samples. Examples of these include the measurement of the integrated
Sachs-Wolfe effect (Giannantonio et al. 2006, 2008) and cosmic magnification bias
(Scranton et al. 2005a), and studies of the clustering of quasars on large (Myers et al.
2006, 2007a) and small (Hennawi et al. 2006a; Myers et al. 2007b) scales. The importance
of photometrically classified quasar samples will only increase during the next decade as
large new imaging surveys will uncover large samples of quasars at fainter magnitudes,
with minimal spectroscopy for the faintest objects. While efficient photometric
classification is one requirement to facilitate studies of quasars without extensive
spectroscopy, it has also been crucial to develop accurate methods for quasar redshift
estimation based on broadband photometry. Techniques for photometric redshift estimation
have long been successful for galaxies (e.g., Baum 1962; Connolly et al. 1995) and became
feasible for quasars with the advent of precise multi-filter photometry (Richards et al.
2001a,b; Budavári et al. 2001; Wolf et al. 2004).



Closely related to the quasar photometric-redshift problem, traditionally seen as a
regression problem, is the question as to how best to perform photometric classification
of quasars. It has become clear that the best classifiers are probabilistic in nature in
that they calculate probabilities for objects to be quasars based on accurately calibrated
models for stellar and quasar photometry (e.g., Richards et al. 2004; Bovy et al. 2011).
These probabilities are often calculated for quasars in certain broad redshift ranges, and
they therefore also act as low-resolution photometric redshifts for the objects they
classify as quasars. The object classification technique of Suchkov et al. (2005) uses
bins of width Δz = 0.2 and, thus, achieves classification with a finer
photometric-redshift estimate. However, detailed photometric redshift estimates for
photometrically classified quasars utilize heterogeneous techniques, such that the
resulting redshift probability distributions are inconsistent with the broad probabilities
used for the initial quasar classification. For instance, this is the case for the
photometric quasar catalogs of Richards et al. (2004, 2009a). For these catalogs, a
non-parametric kernel-density-estimation (KDE) technique that ignores photometric
uncertainties was used to classify quasars, while a parametric model that convolves the
quasar color locus with the photometric uncertainties (a single Gaussian distribution in
bins of redshift Δz ≈ 0.075) was applied to estimate redshift (Weinstein et al. 2004).



For many purposes, one would like to target quasars in arbitrary redshift ranges that
differ from those predetermined and imposed by a broad classification method. For example,
the Baryon Oscillation Spectroscopic Survey (BOSS; Eisenstein et al. 2011) of the Sloan
Digital Sky Survey III (SDSS-III) aims to measure the baryon acoustic feature in the Lyα
forest of medium-redshift (2.2 ≤ z < 4.0) quasars (e.g., McDonald & Eisenstein 2007;
McQuinn & White 2011). The spectral range accessible to the BOSS spectrographs is
3600 < λ < 10000 Å (Barkhouser et al., 2011, in preparation), thus BOSS can only study
the Lyα forest as traced by redshift z > 2.2 quasars. Therefore, BOSS requires quasars to
be targeted based on their probability to be at redshift z > 2.2, and the BOSS quasar
classifiers were trained with this constraint (e.g., Ross et al. 2011; Bovy et al. 2011).
However, other ground-based instruments can observe at shorter wavelengths, e.g., the
Multi-Object Double Spectrograph for the Large Binocular Telescope, which can observe the
spectral range 3400 Å < λ < 10000 Å (Pogge et al. 2010). This instrument could study the
Lyα forest starting at redshift z > 2. A Lyα forest experiment designed for the Large
Binocular Telescope might therefore target quasars in the redshift range 2.0 < z < 2.2 in
addition to those at higher redshift.

Another example of a project that requires accurate photometric characterization of
quasars is the search for binary quasars (Hennawi et al. 2006a; Myers et al. 2008;
Hennawi et al. 2010; Shen et al. 2010), where the key metric is the probability that two
objects are both quasars and proximate in redshift, i.e., the joint (or "overlapping")
probability that both components of a pair of objects are quasars of a particular
redshift. Similarly, the search for projected quasar pairs for absorption line studies
(Hennawi et al. 2006b; Bowen et al. 2006; Hennawi & Prochaska 2007; Prochaska & Hennawi
2009) requires the joint probability that both objects in a projected pair are quasars.
Such calculations require a full model of quasar probabilities and redshifts. Ideally,
therefore, photometric redshift estimation and quasar classification ought to be
performed together.



In the specific case of objects observed using the ugriz filter system (Fukugita et al.
1996), quasar photometric redshifts are plagued by a host of degeneracies at redshifts
where various quasar emission lines are mistaken for the Lyα line (Richards et al. 2002);
this results in "catastrophic" redshift failures, although consideration of the full
redshift posterior distribution function (PDF) shows that most of these failures have a
significant integrated probability around the correct redshift (e.g., Ball et al. 2008;
see below). The addition of non-ugriz data, e.g., ultraviolet (UV) and near-infrared (NIR)
measurements, can both alleviate these redshift degeneracies and improve quasar-star
separation. Quasar classification and characterization in the infrared has been considered
for simulated objects and for quasar samples with a range of depths and areas (e.g.,
Warren et al. 2000; Croom et al. 2001; Francis et al. 2004; Glikman et al. 2006;
Maddox & Hewett 2006; Chiu et al. 2007; Richards et al. 2009b; D'Abrusco et al. 2009;
Assef et al. 2010; Wu & Jia 2010; Peth et al. 2011). The NIR is also the region to search
for the highest redshift quasars (redshift z > 6; Mortlock et al. 2011). These studies
show the great promise that NIR data hold for quasar selection and redshift estimation.
The UV holds a similar potential (see, e.g., Atlee & Gould 2007; Trammell et al. 2007;
Jimenez et al. 2009; Hutchings & Bianchi 2010).



The technique we introduce in this article, which we denote XDQSOz, is the first that
deals with the simultaneous classification of quasars and assignment of quasar redshifts.
This technique extends the XDQSO quasar classification technique of Bovy et al. (2011) to
model the density of quasars in color-redshift space with a flexible semi-parametric model
consisting of a large set of Gaussian component distributions. This model can be
integrated analytically over any redshift range to calculate probabilities from flux
measurements for individual objects. This, in turn, allows quasar probabilities to be
calculated over any redshift range. Thus, a probability distribution in redshift space (a
"PDF") is a natural component of the model. Because we use the extreme deconvolution (XD)
technique (Bovy et al. 2009) as our density estimation tool, the method can be trained on
and applied to low signal-to-noise ratio data, even with missing values, e.g., to objects
missing measurements in any arbitrary collection of filters. This feature allows us to
naturally include UV and NIR broadband fluxes, where sky coverages differ, as part of our
model space and to distinguish sources that are missing data in a particular band from
objects that are dropping out of that band. We show that the addition of UV and NIR
broadband fluxes improves quasar-star separation significantly and that it essentially
breaks all of the redshift degeneracies inherent to the ugriz filter set.



This article is organized as follows. In Section 2 we discuss general aspects of
photometric redshift estimation and classification in the context of quasars. We briefly
describe the data used to train and test the new method in Section 3. Section 4 contains a
full description of the XDQSOz quasar model and Section 5 shows how this model is used to
calculate quasar probabilities over arbitrary redshift ranges. Section 6 assesses the
performance of the photometric redshifts obtained using the XDQSOz model. A discussion of
various extensions of the model is given in Section 7 and we conclude in Section 8. The
Appendix describes the photometric classification and redshift estimation XDQSOz code that
is made publicly available.



In what follows, AB magnitudes (Oke & Gunn 1983) are used throughout. Where dereddened
fluxes and magnitudes are required we have used the reddening maps of Schlegel et al.
(1998). All magnitudes and fluxes should be considered as dereddened unless mentioned
otherwise.



2. General considerations 

A technique for photometric redshift estimation of quasars should have the following 
properties. 



It should provide full probability distributions for the redshift of the quasar based on
its observed photometry, because this information has particular utility for quasars
(e.g., Myers et al. 2009) as near-degeneracies in redshift estimation from broadband
photometry are ubiquitous for quasars (e.g., Richards et al. 2001b; Budavári et al. 2001).



When the redshift probability is evaluated, the photometric uncertainties should be
treated properly to allow photometric redshift estimation for faint objects.

If based on an empirical training set, the technique should be able to be trained on 
low signal-to-noise ratio data with potentially missing data. The training set and the 
evaluation set should also be allowed to have different noise properties, e.g., different 
distributions of signal-to-noise ratio. For example, while the optical fluxes are mostly 
well measured for a spectroscopic training sample, the addition of UV and NIR data 
can help break redshift degeneracies (see below), but these measurements often have 
low signal-to-noise ratio, even for the training set. 



The technique should allow an explicit redshift prior to be specified. 



The key to photometric classification and redshift estimation for quasars based on
broadband fluxes is the joint probability of an object's fluxes, its redshift, and the
proposition that it is a quasar, p(fluxes, z, quasar). This joint probability can be
rewritten in several ways that correspond to different ways of approaching the problem:

\begin{align}
p(\mathrm{fluxes}, z, \mathrm{quasar})
  &= p(\mathrm{fluxes} \mid z, \mathrm{quasar})\, p(z \mid \mathrm{quasar})\, P(\mathrm{quasar}) \tag{1} \\
  &= p(\mathrm{fluxes}, z \mid \mathrm{quasar})\, P(\mathrm{quasar}) \tag{2} \\
  &= p(z \mid \mathrm{fluxes}, \mathrm{quasar})\, p(\mathrm{fluxes} \mid \mathrm{quasar})\, P(\mathrm{quasar}) . \tag{3}
\end{align}

Photometric redshift estimation corresponds to the probability of an object's redshift con- 
ditioned on its fluxes and assuming that it is a quasar: 

\begin{equation}
p(z \mid \mathrm{fluxes}, \mathrm{quasar}) =
  \frac{p(\mathrm{fluxes}, z, \mathrm{quasar})}{p(\mathrm{fluxes}, \mathrm{quasar})} . \tag{4}
\end{equation}

Quasar classification is the probability that an object is a quasar based on its fluxes.
To classify quasars in a certain redshift range Δz, we integrate the joint probability
that the object is a quasar with redshift z over redshift:

\begin{align}
P(\mathrm{quasar\ in\ } \Delta z \mid \mathrm{fluxes})
  &= \int_{\Delta z} \mathrm{d}z\, p(\mathrm{quasar}, z \mid \mathrm{fluxes}) \tag{5} \\
  &= \int_{\Delta z} \mathrm{d}z\, \frac{p(\mathrm{quasar}, z, \mathrm{fluxes})}{p(\mathrm{fluxes})} . \tag{6}
\end{align}

The probability that an object is a quasar of any redshift is obtained by setting the
redshift range Δz = [0, ∞]. The normalization factor p(fluxes) in this equation is given by

\begin{equation}
p(\mathrm{fluxes}) = p(\mathrm{fluxes}, \mathrm{quasar}) + p(\mathrm{fluxes}, \mathrm{not\ a\ quasar}) . \tag{7}
\end{equation}

The probability of an object not being a quasar can be obtained empirically by modeling
the fluxes of non-quasars (see Richards et al. 2004; Bovy et al. 2011).
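
Equations (4)-(7) can be illustrated with a minimal numerical sketch. It assumes that the
joint density p(fluxes, z, quasar) of one object has already been evaluated on a redshift
grid and that p(fluxes, not a quasar) is available as a single number. The grid, the
function names, and the trapezoidal integration are illustrative assumptions and are not
taken from the XDQSOz code, which performs these integrals analytically.

```python
def trapezoid(y, x):
    """Trapezoidal integral of samples y over the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def redshift_posterior(joint_qso, z_grid):
    """Equation (4): p(z | fluxes, quasar) on the grid, normalized to unit integral."""
    norm = trapezoid(joint_qso, z_grid)  # p(fluxes, quasar)
    return [v / norm for v in joint_qso]


def quasar_probability(joint_qso, z_grid, p_flux_nonqso, z_lo, z_hi):
    """Equations (5)-(7): P(quasar in [z_lo, z_hi] | fluxes).

    joint_qso     : p(fluxes, z, quasar) sampled on z_grid (hypothetical input)
    p_flux_nonqso : p(fluxes, not a quasar), e.g. from a stellar model
    """
    p_flux_qso = trapezoid(joint_qso, z_grid)
    # keep only the grid points inside the requested redshift range
    pairs = [(z, v) for z, v in zip(z_grid, joint_qso) if z_lo <= z <= z_hi]
    zs = [z for z, _ in pairs]
    vs = [v for _, v in pairs]
    return trapezoid(vs, zs) / (p_flux_qso + p_flux_nonqso)
```

Because the same joint density feeds both functions, the redshift posterior and the
classification probability are consistent with each other by construction, which is the
point of the unified approach.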



The discussion above suggests that a unified approach to classification and photometric 
redshift estimation is possible. Because the method described in this article is the first tech- 
nique in this class, we briefly discuss previous attempts at photometric redshift estimation 
and how they fit in the framework outlined in this section. 



The k-nearest neighbors approach of Ball et al. (2007) is an instance-based machine
learning technique that compares the colors of test objects to the k nearest objects in
color-space in a training set, and assigns a weighted combination of the redshifts of
those nearest neighbors to the test object. Its generalization to take observational
flux-uncertainties into account involves perturbing both the test and the training data
within their Gaussian noise ellipsoids (Ball et al. 2008). In its noiseless implementation
the method does not return a full probability distribution for the redshift. When taking
the photometric uncertainties into account the technique essentially returns samples from
p(z | fluxes, quasar) as in equation (3), which can be binned to obtain the full posterior
distribution function. While the photometric uncertainties of the test objects are handled
correctly, the approach for dealing with the uncertainties of the training data
effectively convolves with the uncertainties twice, as it adds scatter to the training
data that are already scattered from the intrinsic distribution due to photometric noise.
Because the technique directly uses the training set, it also implicitly applies a
redshift prior that approaches the observed redshift distribution. This choice of prior
does not reflect the intrinsic redshift distribution, as the observed distribution is
shaped by various selection effects (Richards et al. 2006).



The approach taken by Hennawi et al. (2010) consists of fitting the relative-flux-redshift
distribution and its scatter to produce the likelihood of the quasar redshift as in
equation (1). This fit is conducted without taking the flux uncertainties into account,
but upon evaluation of test objects the flux uncertainties are fully handled.



Closest to the approach taken in this article is the technique of Weinstein et al. (2004).
The distribution of colors in a set of narrow bins in redshift is fit as a single
multi-variate Gaussian distribution. This approach is similar to quasar classification
approaches where the color or relative-flux distributions of quasars are fit in much
broader redshift ranges using more general density models (Richards et al. 2004; Bovy et
al. 2011). Weinstein et al. (2004) do not use the photometric uncertainties of the data
they use for training. But, as in Hennawi et al. (2010), photometric uncertainties for
test objects are fully taken into account.



All of the techniques described above could be extended to allow quasar classification by
specifying the necessary factors of P(quasar) or p(fluxes, quasar) in equations (1)-(3).
The latter could be taken from a quasar classification scheme such as NBC-KDE (Richards et
al. 2009a) or XDQSO (Bovy et al. 2011), although care should be taken that the
classification method uses the same redshift prior as the photometric redshift technique
for consistency.

The XDQSOz technique introduced in this article uses equation (2) as the basis of both
quasar classification and photometric redshift estimation. Specifically, we model the
relative-flux-redshift distribution using a large number of Gaussians by deconvolving this
distribution for a training set using the XD technique (Bovy et al. 2009). We use
empirical relative fluxes that are re-weighted using an explicit, magnitude-dependent
redshift prior (which can easily be divided out). As described in Section 5, conditioning
on the fluxes to obtain full photometric probability distributions for the redshift, and
marginalization over redshift to classify quasars, is simple and fast in this approach.
Because we deconvolve the relative-flux-redshift distribution when training, we can
straightforwardly incorporate UV and NIR data, both of which significantly improve the
accuracy and precision of the inferred redshifts.
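
Why conditioning and marginalization are cheap for a Gaussian mixture can be seen from a
short sketch. It uses the standard conditioning identities for multivariate Gaussians and
assumes, purely for illustration, that the redshift coordinate is stored as the last
dimension of each component and that the flux uncertainties are Gaussian with covariance
`flux_cov`. The function names and array layout are hypothetical and are not those of the
released XDQSOz code.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian N(x | mean, cov)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    chi2 = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + chi2))


def condition_on_fluxes(amps, means, covs, flux, flux_cov):
    """Condition a mixture over (fluxes, redshift coordinate) on noisy fluxes.

    Returns the weights, means, and variances of the resulting 1D mixture in
    the redshift coordinate, plus the marginal density of the fluxes.
    Assumption: the redshift coordinate is the LAST dimension of each component.
    """
    weights, cond_means, cond_vars = [], [], []
    for a, m, V in zip(amps, means, covs):
        T = V[:-1, :-1] + flux_cov               # flux block convolved with the noise
        gain = np.linalg.solve(T, V[:-1, -1])    # T^{-1} V_xz
        weights.append(a * gaussian_pdf(flux, m[:-1], T))
        cond_means.append(m[-1] + gain @ (flux - m[:-1]))
        cond_vars.append(V[-1, -1] - V[:-1, -1] @ gain)
    weights = np.array(weights)
    marginal = weights.sum()                     # mixture density of the fluxes alone
    return weights / marginal, np.array(cond_means), np.array(cond_vars), marginal
```

The returned one-dimensional mixture is the redshift posterior of equation (4), and
`marginal` is the flux density that enters the classification in equations (5)-(7); both
come out of the same pass over the components.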

3. Training data



3.1. Optical data from the Sloan Digital Sky Survey 



The Sloan Digital Sky Survey (SDSS; York et al. 2000) has obtained u, g, r, i, and z CCD
imaging of ~ 10^4 deg^2 of the northern and southern Galactic sky (Gunn et al. 1998;
Stoughton et al. 2002; Gunn et al. 2006). SDSS-III (Eisenstein et al. 2011) has extended
this area by approximately 2,500 deg^2 in the southern Galactic cap (Aihara et al. 2011).
All the data processing, including astrometry (Pier et al. 2003), source identification,
deblending and photometry (Lupton et al. 2001), and calibration (Fukugita et al. 1996;
Hogg et al. 2001; Smith et al. 2002; Ivezić et al. 2004; Padmanabhan et al. 2008), are
performed with automated SDSS software. SDSS DR7 imaging observations were obtained over
the period 2000 March to 2007 July.

The SDSS training data used here are essentially the same as the data used to train the
XDQSO method; they are described in detail in Bovy et al. (2011).



We use a sample of 103,601 spectroscopically-confirmed redshift z > 0.3 quasars from the
SDSS DR7 quasar catalog (Richards et al. 2002; Schneider et al. 2010). We use all of these
quasars to essentially model the color-redshift relation for quasars (but see below for
the detailed description of our method). We combine the color-redshift relation with an
apparent-magnitude dependent redshift prior obtained by integrating a model for the quasar
luminosity function over the apparent-magnitude range of interest (Hopkins, Richards, &
Hernquist 2007). This prior for a few bins in apparent magnitude is shown in Figure 1;
also shown is the difference between the Hopkins, Richards, & Hernquist (2007) redshift
prior and the prior derived from the Richards et al. (2006) luminosity function. Because
we apply a sample of quasars from the SDSS DR7 quasar catalog that spans a wide range in
luminosity to a narrow range in apparent magnitude, and extrapolate it to largely
unexplored faint flux levels, we are ignoring correlations between quasar spectral
properties and luminosity (e.g., Baldwin 1977; Yip et al. 2004). These correlations mostly
affect emission line shapes, such that they are washed out in broadband colors, especially
compared to the intrinsic color-scatter.



3.2. UV data from the Galaxy Evolution Explorer 



In addition to ugriz optical data, we use UV data obtained by the Galaxy Evolution
Explorer space mission (GALEX; Martin et al. 2005). GALEX has performed an all-sky imaging
survey in two UV bands (FUV: 1350 to 1750 Å; NUV: 1750 to 2750 Å) down to m_AB ≈ 20.5 and
a medium-deep imaging survey that reaches m_AB ~ 23 (e.g., Bianchi et al. 2011). Some of
the data used below to test the technique described in this article comes from the
medium-deep survey, while much of the data used to train our technique comes from the
shallower all-sky survey (because our training sample of quasars is drawn from the full
~ 10,000 deg^2 SDSS footprint), but this difference is largely offset by the fact that our
training set is brighter than the faint part of the test set. GALEX GR5 observations were
obtained between April 2003 and February 2009.



Rather than using GALEX catalog products (Morrissey et al. 2007) we use measurements of
the UV fluxes obtained by force-photometering GALEX images (from GALEX Data Release 5) at
the SDSS centroids (Aihara et al. 2011), such that we obtain low signal-to-noise PSF
fluxes of objects not detected by GALEX. As we show below, these low signal-to-noise ratio
observations are essential for better classification of redshift z > 2 quasars. We expect
these measurements to be released as part of SDSS Data Release 9, scheduled for 2012. The
top panel of Figure 2 shows the distribution of signal-to-noise ratio for SDSS quasars in
the GALEX footprint. A total of 62,661 objects lie in the GALEX FUV footprint, 63,372 lie
in the NUV footprint, and 62,628 are covered by both bandpasses.



3.3. NIR data from the UKIRT Infrared Deep Sky Survey 



We also use NIR data to improve quasar classification and photometric redshift estimation.
The UKIRT Infrared Deep Sky Survey (UKIDSS) is defined in Lawrence et al. (2007) and
consists of five survey components with different wavebands, depths and footprints. For
the study in this paper we use data from the UKIDSS Large Area Survey (LAS). Technical
details about the UKIDSS LAS observing strategy are described in Dye et al. (2006).

The UKIDSS LAS aims to cover 4,000 deg^2 of the SDSS footprint in the Y, J, H and K
wavebands. In this paper we use data from UKIDSS LAS DR7, which includes observations
obtained between May 2005 and July 2009 inclusive. The UKIDSS LAS DR7 overlaps the SDSS
imaging footprint over ~ 2500 deg^2 and has median point source 5-sigma AB magnitude
limits in Y, J, H and K of 20.9, 20.6, 20.2, and 20.2, respectively.

The UKIDSS data are acquired with the UKIRT Wide Field Camera (WFCAM; Casali et al. 2007).
The UKIDSS photometric system is described in Hewett et al. (2006), and the calibration is
described in Hodgkin et al. (2009). The pipeline processing and science archive are
described in M. J. Irwin et al. (2012, in preparation) and Hambly et al. (2008).



As in the case of the GALEX data described in Section 3.2, we use force-photometered NIR
fluxes at SDSS positions rather than UKIDSS catalog data. This "list-driven" information
is derived from aperture photometry on Data Release 7 of the UKIDSS LAS. We choose an
aperture radius of 1 arcsec. Of the 103,601 quasars in the SDSS DR7 quasar sample, 29,726
lie within the UKIDSS DR7 K-band footprint. Approximately 22,000 of these quasars are
detected in the K band and also have overlapping coverage in all four observed wavebands
(Y, J, H and K) in the UKIDSS LAS DR7 source catalog. The bottom panel of Figure 2 shows
the distribution of signal-to-noise ratio in the NIR for quasars in our training sample.
Unlike for SDSS imaging, measurements in all four filters of the UKIDSS survey are not
obtained during the same observing run: H and K observations are performed in the same
observing block, while Y and J are conducted separately. Thus the sky coverage in
different UKIDSS bands varies, and many objects are missing data in one or more of the
four bands. The breakdown of training quasars with observations in the NIR by bandpass is:
Y: 26,876; J: 27,328; H: 28,911; K: 29,726. A total of 25,510 objects have measurements in
all four bandpasses.

The epochs of the UKIDSS, GALEX, and SDSS observations can differ by up to 9 years in the
observed frame. For a quasar with a redshift of 2 this is 3 years in the rest frame. The
optical variability of radio-quiet quasars over rest-frame timescales of 2 to 5 years is
observed to be in the range 0.1-0.2 mag and is a function of absolute magnitude (Hook et
al. 1994). Vanden Berk et al. (2004) find that the variability amplitude decreases with
rest-frame wavelength by a factor of two between 1500 Å and 6000 Å, with an amplitude of
~ 0.15 mag at 6000 Å (see also Welsh et al. 2011).
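
The conversion between observed-frame and rest-frame quantities used in this argument is
the usual cosmological time dilation and wavelength stretch by a factor of (1 + z). As a
small illustrative sketch (the function names are hypothetical):

```python
def rest_frame_interval(observed_years, z):
    """Time interval in the quasar rest frame for an observed-frame interval."""
    return observed_years / (1.0 + z)


def rest_frame_wavelength(observed_angstrom, z):
    """Rest-frame wavelength sampled by an observed-frame wavelength."""
    return observed_angstrom / (1.0 + z)


# 9 years between survey epochs, seen at redshift 2
print(rest_frame_interval(9.0, 2.0))  # 3.0
```

The same factor determines which rest-frame wavelengths, and therefore which variability
amplitudes, a given bandpass samples at a given redshift.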



Kozłowski et al. (2010b) have studied the mid-IR variability using multi-epoch Spitzer
observations of a sample of ~ 1000 active galactic nuclei and find that the rest-frame J
band variability amplitude over this rest-frame timescale is ~ 0.1 mag. In summary, the
quasars in this study are expected to vary by ~ 0.3 and 0.1 magnitudes in the UV and NIR,
respectively, over the elapsed period of the observations. This is less than the average
photometric errors in the individual wavebands and significantly less than the range in
colors.



4. Flux-redshift density model



The photometric redshift technique XDQSOz is an adaptation of the XDQSO technique (Bovy et
al. 2011) to include redshift explicitly in the model for the quasar population. XDQSOz
achieves this by modeling the p(fluxes, z | quasar) factor in equation (2), where the
XDQSO technique modeled p(fluxes | quasar in Δz) in three bins in redshift (corresponding
to low-, medium-, and high-redshift quasars). As discussed in Section 2, this approach
allows us to obtain full posterior distribution functions for the redshift of a
photometrically classified quasar based on its broadband fluxes. Integrating this redshift
probability distribution over a range of redshifts and properly normalizing this result
using equation (7) gives a photometric quasar probability in the chosen redshift range
that is, as we show below, competitive with the best available photometric quasar
classification techniques, e.g., XDQSO.

Footnote (UKIDSS aperture photometry, Section 3.3): For a further description of the
UKIDSS data processing by the Cambridge Astronomy Survey Unit see
http://casu.ast.cam.ac.uk/surveys-projects/wfcam/technical/catalogue-generation.

To estimate the density of quasars in flux-redshift space we use extreme deconvolution
(Bovy et al. 2009). As described in Section 3, our training set consists of the SDSS DR7
quasar sample, which consists mostly of bright, viz., dereddened i < 19.1 mag (i < 20.2
mag for z > 3 sources), objects with small photometric uncertainties. The GALEX and UKIDSS
data described in Sections 3.2 and 3.3 are much shallower than the SDSS data and many
objects are not detected at high significance in these surveys, such that photometric
uncertainties are not insignificant (see Figure 2). Additional complications are that
these two supplemental surveys have not observed the full SDSS footprint and that the
UKIDSS LAS footprint is different for the different NIR bands, such that we have
heterogeneous missing data and heteroscedastic uncertainties. XD is uniquely suited to
deal with these complications in the proper probabilistic manner. XD assumes that the flux
uncertainties are known and that they are close to Gaussian, as is the case for PSF fluxes
for point-sources in SDSS (Ivezić et al. 2003; Scranton et al. 2005b; Ivezić et al. 2007).
We assume that the spectroscopic redshifts have vanishing uncertainties because their
typical value of σ_z ~ 0.004 (Schneider et al. 2010) is orders of magnitude smaller than
typical uncertainties in broadband photometric redshifts, which are set by the width of
the quasar locus.

XD models the underlying, deconvolved distribution as a sum of K d-dimensional Gaussian
distributions, where K is a free parameter that is set using an external objective (see
below). XD consists of a fast and robust algorithm to estimate the best-fit parameters of
the Gaussian mixture.
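
How such a mixture accommodates heteroscedastic uncertainties and missing measurements can
be sketched through the likelihood of a single noisy object: each component covariance is
convolved with that object's own noise covariance, and unobserved dimensions are simply
projected out. The sketch below evaluates only this per-object log-likelihood; it is not
the XD fitting algorithm itself, and the function name, argument names, and array layout
are assumptions made for illustration.

```python
import numpy as np


def xd_loglike(obs, obs_cov, observed, amps, means, covs):
    """log p(obs) under a noise-convolved Gaussian mixture.

    obs      : noisy measurement vector (entries of missing dimensions are ignored)
    obs_cov  : noise covariance of this object
    observed : boolean mask marking the dimensions that were actually measured
    amps, means, covs : mixture amplitudes, means, and covariances (hypothetical layout)
    """
    idx = np.flatnonzero(observed)
    w = obs[idx]
    S = obs_cov[np.ix_(idx, idx)]
    log_terms = []
    for a, m, V in zip(amps, means, covs):
        T = V[np.ix_(idx, idx)] + S              # intrinsic scatter plus this object's noise
        diff = w - m[idx]
        _, logdet = np.linalg.slogdet(T)
        chi2 = diff @ np.linalg.solve(T, diff)
        log_terms.append(np.log(a) - 0.5 * (len(idx) * np.log(2.0 * np.pi) + logdet + chi2))
    return np.logaddexp.reduce(log_terms)        # stable log of the sum over components
```

An object with no NIR coverage, for example, would be evaluated with the NIR entries of
`observed` set to False, so that it constrains only the optical and UV part of each
component.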



4.1. Construction of the quasar flux-redshift model

The full quasar-density model is constructed by fitting the flux-redshift density of
quasars in a number of bins in the i-band magnitude. As we use the same set of quasars in
each bin with a different redshift prior (see the discussion in Section [3]), we could instead
have fit a single bin, e.g., the brightest. The other bins could have been constructed by
dividing out the redshift prior of the first bin and multiplying in the redshift priors for the
fainter bins. However, as we will show below, the advantage of a Gaussian representation
of the flux-redshift density is that it allows integrals of this density over arbitrary redshift
ranges to be calculated analytically. This leads to fast quasar-probability estimation. If we
were instead to divide out the redshift prior and multiply in a different redshift prior, the
resulting function would no longer be Gaussian and the numerical integration over redshift
would be much more computationally expensive. Because our short-term objective is to
run this algorithm on essentially all of the ~10^8 SDSS point sources and in the future
on the ~15 PB of LSST catalog data (Abell et al. 2009), this computational advantage is
important. After fitting the first bin, all other fits are initialized using the previous bin's
optimal solution; these extra fits all converge very quickly as the quasar flux-redshift density
does not vary strongly with apparent magnitude. The redshift prior is shown for a few bins
in apparent magnitude in Figure [1]. If a different redshift prior is desired, one can divide
out this prior and multiply in a new prior (these priors are included in the code release
described in the Appendix). For example, if one would prefer to use the Richards et al. (2006)
model for the quasar luminosity function, one would multiply the posterior distribution
function for the redshift obtained using the fiducial Hopkins, Richards, & Hernquist (2007) prior
by the factor shown in the bottom panel of Figure [1]. This panel shows the ratio of the
Richards et al. (2006) redshift prior to the Hopkins, Richards, & Hernquist (2007) prior in
a number of apparent-magnitude bins. It is clear that there is only a significant difference
at relatively large redshift and at faint magnitudes, where constraints on the luminosity
function are sparse.

Footnote: Code available at http://code.google.com/p/extreme-deconvolution/
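The prior swap described in this paragraph can be illustrated numerically. The following is a minimal sketch, assuming the posterior and both priors have been tabulated on a common redshift grid; the function names and the two toy priors are hypothetical stand-ins and are not taken from the released code or from either luminosity-function model.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal-rule integral of y(x) on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def reweight_posterior(z, post_old, prior_old, prior_new):
    """Divide out prior_old, multiply in prior_new, and renormalize."""
    safe_old = np.where(prior_old > 0, prior_old, 1.0)
    ratio = np.where(prior_old > 0, prior_new / safe_old, 0.0)
    post_new = post_old * ratio
    return post_new / trapezoid(post_new, z)

# Toy example: a two-peaked likelihood and two made-up priors (assumptions).
z = np.linspace(0.3, 5.5, 521)
likelihood = (np.exp(-0.5 * ((z - 0.8) / 0.1) ** 2)
              + np.exp(-0.5 * ((z - 2.3) / 0.1) ** 2))
prior_a = np.exp(-z / 1.5)   # stand-in for the fiducial prior
prior_b = np.exp(-z / 3.0)   # stand-in for the alternative prior

post_a = likelihood * prior_a
post_a /= trapezoid(post_a, z)
post_b = reweight_posterior(z, post_a, prior_a, prior_b)
```

Because the reweighted posterior is no longer a mixture of Gaussians, integrals over redshift ranges then have to be done numerically on the grid, which is the cost the text refers to.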

As for the XDQSO technique, we divide the quasar-density model into a factor describ- 
ing, essentially, the color-redshift density of quasars — but we again use relative fluxes rather 
than colors — and another factor describing the apparent-magnitude distribution of quasars. 
We adopted this approach because the flux density of quasars has a dominant power-law 
shape corresponding to the number counts as a function of apparent magnitude, while the 
color distribution is much flatter. We write 



$$ p(\mathrm{fluxes}, z \mid \mathrm{quasar}) = p(\mathrm{fluxes\ relative\ to\ } i, z \mid \mathrm{quasar})\; p(i\text{-band flux} \mid \mathrm{quasar}) \, . \tag{8} $$

The apparent-magnitude factor does not depend on redshift, so this factor is the same as
that used in the XDQSO method. The factor is calculated as the sum of the apparent-magnitude
priors in Figure 1 of Bovy et al. (2011), weighted by the quasar densities in Table 1 of the
same article. As quasar redshifts are always positive, we model the logarithm of the redshift.
Since our training sample consists of point-like objects at redshift z > 0.3, almost all at
z < 5.5, our model should only be trusted to return reasonable densities within this range.

Footnote: Alternatively, we could have modeled the density using a uniform prior over redshift
and modeled the magnitude-dependent redshift prior as a polynomial or another mixture of
Gaussians. Integrating a polynomial or mixture of Gaussians times a mixture of Gaussians could
also be performed analytically and fast.

In each bin we model the d-dimensional relative-flux-redshift density of quasars, where d
is the number of independent colors plus one (for redshift), using 60 Gaussians that are
allowed to have arbitrary means, variance matrices, and amplitudes; the amplitudes are
constrained to sum to one. We use the full set of 103,601 z > 0.3 quasars to train the final
model, but in order to test whether we are under- or overfitting the data we performed a
cross-validation test. To cross-validate we extract a random subset containing 10 percent of
the full sample to use as an independent test data set. By training the model on the remaining
90 percent of the sample we can select the number of Gaussians that optimally predicts (i.e.,
predicts with the highest probability) the redshifts of objects in the test sample. The results
from this procedure are shown in Figure [3]. Our ability to better predict the test redshifts
saturates around 50 Gaussians; we chose 60 Gaussians to represent the relative-flux density of
quasars. Compared to the XDQSO method, which used 20 Gaussians in each of three redshift
bins, this revised approach uses the same number of Gaussians while representing an extra
dimension (redshift). One might be concerned that because the Gaussians will preferentially
be found in high-density, viz., low-redshift, regions, the density of medium- and high-redshift
quasars is not adequately described in the XDQSOz model. We will see below that this is
not the case and that XDQSOz performs similarly to XDQSO in selecting medium- and
high-redshift quasars.
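The 90/10 cross-validation used to choose the number of Gaussians can be sketched on toy data. This is only an illustration under strong simplifying assumptions: it uses plain expectation-maximization in one dimension with no per-object uncertainties (not extreme deconvolution), it scores the held-out density rather than the predicted redshift, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_gmm_1d(x, K, n_iter=200):
    """Plain EM for a 1-D Gaussian mixture (no measurement uncertainties)."""
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)   # deterministic initialization
    var = np.full(K, np.var(x))
    amp = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        logp = (np.log(amp) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update amplitudes, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        amp = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return amp, mu, var

def mean_loglike(x, amp, mu, var):
    """Average log-density of the points x under the fitted mixture."""
    logp = (np.log(amp) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)
    m = logp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))))

# Synthetic two-component data, split 90 percent / 10 percent.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 1000), rng.normal(3, 1, 1000)])
rng.shuffle(x)
n_test = len(x) // 10
test, train = x[:n_test], x[n_test:]

scores = {K: mean_loglike(test, *fit_gmm_1d(train, K)) for K in (1, 2, 3)}
best_K = max(scores, key=scores.get)
```

In this toy setting the held-out score improves sharply from one to two components and then flattens, which is the kind of saturation the text describes around 50 Gaussians for the real data.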



The full model consists of 47 bins of width 0.2 mag between i = 17.7 and i = 22.5, 
spaced 0.1 mag apart (adjacent bins overlap). As described above, the XD fits for all but 
the brightest bin are initialized using the best-fit parameters for the previous bin. Each bin 
uses the full set of 103,601 redshift z > 0.3 quasars. 
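The bin layout can be written down explicitly. A short sketch, assuming that "best matches" means the bin whose center is nearest to the object's magnitude; the variable and function names are hypothetical.

```python
import numpy as np

BIN_WIDTH = 0.2   # mag
BIN_STEP = 0.1    # mag between adjacent bin centers

# Centers run from 17.8 to 22.4 so that the bin edges span 17.7 to 22.5.
centers = np.round(17.8 + BIN_STEP * np.arange(47), 1)
lower = centers - BIN_WIDTH / 2
upper = centers + BIN_WIDTH / 2

def best_bin(i_mag):
    """Index of the bin whose center is closest to the i-band magnitude."""
    return int(np.argmin(np.abs(centers - i_mag)))
```

With nearest-center assignment, every magnitude between 17.75 and 22.45 falls within 0.05 mag of a bin center.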

In each of the 47 bins we fit 60 d-dimensional Gaussians, yielding a total of
47 × (60 × [1 + d + d(d + 1)/2] − 1) parameters. The ugriz-only model has 59,173 parameters, the
model that also uses the two UV bands has 101,473 parameters, the model that adds the
four NIR bands to the optical fluxes has 155,053 parameters, and the full UV-ugriz-NIR
11-dimensional model has 219,913 parameters. To obtain the total number of parameters for
photometrically classifying quasars using XDQSOz, we need to add the number of parameters
describing the stellar relative-flux density in 47 bins (from the XDQSO method) to
this number, amounting to 14,053, 26,273, 42,253, and 61,993 parameters for the ugriz,
ugriz+UV, ugriz+NIR, and ugriz+UV+NIR models, respectively. Models including UV
or NIR data are trained using any available data, i.e., any object with a measured flux in
any of the bandpasses is included in the training set.
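The parameter counts quoted above follow directly from the formula. The sketch below reproduces them; the helper name is hypothetical, and the choice of 20 Gaussians per bin for the stellar model is an assumption, adopted because it reproduces the quoted stellar counts.

```python
def n_params(n_bins, n_gauss, dim):
    """Free parameters of n_bins mixtures of n_gauss Gaussians in dim dimensions.

    Each Gaussian has 1 amplitude, dim means, and dim*(dim+1)/2 covariance
    entries; one amplitude per mixture is fixed by the sum-to-one constraint.
    """
    per_gaussian = 1 + dim + dim * (dim + 1) // 2
    return n_bins * (n_gauss * per_gaussian - 1)

# Number of relative fluxes per model; redshift adds one dimension for quasars.
n_relflux = {"ugriz": 4, "ugriz+UV": 6, "ugriz+NIR": 8, "ugriz+UV+NIR": 10}

quasar = {name: n_params(47, 60, n + 1) for name, n in n_relflux.items()}
# Assumption: 20 Gaussians per bin and no redshift dimension for stars.
star = {name: n_params(47, 20, n) for name, n in n_relflux.items()}
```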



4.2. Comparison of the model and observations 



In this section, we assess the performance of the XD technique for modeling the relative- 
flux-redshift distribution of quasars, and provide examples which demonstrate that the XD 
technique produces excellent fits to the data. We demonstrate that the XD method does 
an excellent job of empirically calibrating the color-redshift relation; the ability of the XD 
technique to model the relative-flux density of quasars in the current XDQSOz context is
excellent as well, but it is very similar to the performance in the XDQSO context and we
refer the reader to Bovy et al. (2011) for a discussion of this performance.



Figure [4] shows relative-flux-redshift and color-redshift diagrams of quasars for a single
i-band magnitude bin. The conditional distribution of relative flux as a function of redshift
is shown here (although the model contains a full model of the density on this manifold); 
this emphasizes what is new in XDQSOz as compared to XDQSO. We see that the XD 
technique is superb at capturing the complexity of the quasar color locus, even at higher 
redshifts where the data are sparse and noisy. The locations where prominent emission lines 
cross the relevant SDSS filters are indicated, and it is clear that this drives much of the 
structure in the color-redshift relation. 

Figure [5] shows similar relative-flux-redshift diagrams for the UV fluxes in the model
containing both optical and UV data. The agreement between the empirical model and the
data is excellent. These diagrams clearly demonstrate that the UV flux of z > 1 and z > 2.3
quasars, for FUV and NUV respectively, is suppressed because of absorption below the
Lyman limit (λ912 Å) by intervening systems (Møller & Jakobsen 1990; Picard & Jakobsen
1993; Worseck & Prochaska 2011). Even at low UV signal-to-noise ratio, UV observations are
an excellent tool to distinguish z ~ 0.8 quasars from z ~ 2.3 quasars, which have degenerate
ugriz colors and plague medium-redshift quasar selection (e.g., Ross et al. 2011).

Figure [6] presents relative-flux-redshift diagrams for the four NIR fluxes in the XDQSOz
model that contains optical and NIR data. The agreement between the XDQSOz model and
the data is again excellent and the XDQSOz model captures all of the photometric redshift
information contained in the NIR. We see that much of the variation in the color-redshift
relation in the NIR is driven by the Hα line (see also Glikman et al. 2006; Assef et al. 2010;
Peth et al. 2011).



The model-data comparisons given in this section are only a small fraction of the
model-assessment diagnostics that we performed. For example, we do not show the optical fits
here in models that contain UV or NIR data, nor do we show the UV models and NIR models
in the full ugriz+UV+NIR model, as all of these comparisons are very similar to the ones
shown here.



5. Targeting and photometric quasar classification with XDQSOz 

We can use the XDQSOz model to photometrically classify and target quasars by calculating
the probability that an object is a quasar based on its broadband fluxes. The
probability that an object is a quasar in a redshift range Δz is obtained by integrating the
probability that an object is a redshift z quasar over redshift. We start by using equation ([5])

$$ p(z, \mathrm{quasar} \mid \mathrm{fluxes}) \propto p(z, \{f_j/f_i\} \mid f_i, \mathrm{quasar})\, p(f_i, \mathrm{quasar}) \, , \tag{9} $$

where $\{f_j/f_i\}$ is the set of fluxes relative to $f_i$ and $f_i$ is the i-band flux of the
object. The normalization factor is given by

$$ p(\mathrm{fluxes}) = p(\mathrm{fluxes}, \mathrm{star}) + \int_0^\infty \mathrm{d}z\, p(z, \mathrm{fluxes}, \mathrm{quasar}) \, . \tag{10} $$

Because the apparent-magnitude factor $p(f_i, \mathrm{quasar})$ does not depend on redshift, the
integral over redshift is only over the $p(z, \{f_j/f_i\} \mid f_i, \mathrm{quasar})$ factor, which
is modeled as a simple sum of Gaussian distributions.
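The way equations (9) and (10) combine into a quasar probability for a redshift range can be shown with a few lines of arithmetic. This is a sketch only: the function and argument names are hypothetical, and it assumes that the stellar and quasar densities are evaluated in the same flux variables so that they can be added directly.

```python
def quasar_probability(relflux_in_range, relflux_all_z, p_fi_quasar, p_fluxes_star):
    """P(quasar in redshift range | fluxes) from the factors in equations (9)-(10).

    relflux_in_range : integral of p(z, {f_j/f_i} | f_i, quasar) over the range
    relflux_all_z    : the same integral over all redshifts
    p_fi_quasar      : apparent-magnitude factor p(f_i, quasar)
    p_fluxes_star    : stellar density p(fluxes, star)
    """
    numerator = relflux_in_range * p_fi_quasar
    normalization = p_fluxes_star + relflux_all_z * p_fi_quasar
    return numerator / normalization

# Made-up numbers purely for illustration.
p_range = quasar_probability(0.2, 0.5, 2.0, 1.0)
p_any_z = quasar_probability(0.5, 0.5, 2.0, 1.0)
```

The redshift-independent magnitude factor multiplies both redshift integrals, so only the relative-flux-redshift mixture has to be integrated for each new redshift range.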

For any given object we can simplify the mixture of n-dimensional Gaussian distributions
to a mixture of one-dimensional Gaussian distributions for the redshift of the object. First,
we find the bin in the i-band magnitude that best matches the object's i-band magnitude
and use the mixture-of-Gaussians representation of the relative-flux-redshift density in this
bin. Assuming that the n-dimensional mixture of Gaussians has amplitudes $\alpha^k$, means
$\mathbf{m}^k$, and variance matrices $\mathbf{V}^k$, we can condition each of the components on
the measured relative flux $\mathbf{r} = \{f_j/f_i\}$ of the object and its uncertainty variance
matrix $\mathbf{S}$ to find (e.g., Appendix B of Bovy et al. 2009)

$$ \mu_{z,k} = m_z^k + \mathbf{V}_{zr}^k \left(\mathbf{T}_{rr}^k\right)^{-1} \left(\mathbf{r} - \mathbf{m}_r^k\right) \, , \tag{11} $$

$$ \sigma_{z,k}^2 = V_{zz}^k - \mathbf{V}_{zr}^k \left(\mathbf{T}_{rr}^k\right)^{-1} \left(\mathbf{V}_{zr}^k\right)^{T} \, , \tag{12} $$

while the amplitudes of these one-dimensional Gaussian distributions are given by the
posterior probability that the object was drawn from component k

$$ \alpha_{z,k} = \frac{\alpha^k\, \mathcal{N}(\mathbf{r} \mid \mathbf{m}_r^k, \mathbf{T}_{rr}^k)}{\sum_j \alpha^j\, \mathcal{N}(\mathbf{r} \mid \mathbf{m}_r^j, \mathbf{T}_{rr}^j)} \, . \tag{13} $$

In these expressions $\mathbf{m}_r^k$ and $m_z^k$ are the relative-flux and the redshift parts of
$\mathbf{m}^k$, respectively; $\mathbf{T}_{rr}^k = \mathbf{V}_{rr}^k + \mathbf{S}$; $\mathbf{V}_{rr}^k$,
$\mathbf{V}_{zr}^k$, and $V_{zz}^k$ are the relative-flux-relative-flux, redshift-relative-flux,
and redshift-redshift parts of $\mathbf{V}^k$, respectively; and $\mathcal{N}(\cdot \mid \cdot, \cdot)$
is the multivariate Gaussian distribution. $\mathbf{T}_{rr}^k$ includes the uncertainty variance
matrix $\mathbf{S}$ because the necessary uncertainty convolution simply reduces to adding the
observational uncertainty variance matrix to the intrinsic variance matrix for each Gaussian
component.
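The conditioning step of equations (11) to (13) can be sketched with NumPy as follows. The array layout is an assumption made for this illustration (the log-redshift coordinate is stored last in each mean vector and variance matrix), and the function names are hypothetical rather than those of the released code.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log of the multivariate normal density N(x | mean, cov)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def condition_on_relflux(amps, means, covs, r, S):
    """Reduce a mixture in (r, log z) to a 1-D mixture in log z given noisy r.

    amps: (K,), means: (K, d), covs: (K, d, d); the last coordinate is log z.
    r: (d-1,) measured relative fluxes; S: (d-1, d-1) their uncertainty matrix.
    Returns conditional amplitudes, means, and variances, each of shape (K,).
    """
    K = len(amps)
    mu_z, var_z, log_w = np.empty(K), np.empty(K), np.empty(K)
    for k in range(K):
        m_r, m_z = means[k, :-1], means[k, -1]
        T_rr = covs[k, :-1, :-1] + S            # uncertainty convolution
        V_zr = covs[k, -1, :-1]
        V_zz = covs[k, -1, -1]
        gain = np.linalg.solve(T_rr, V_zr)      # T_rr^{-1} V_zr^T (T_rr is symmetric)
        mu_z[k] = m_z + gain @ (r - m_r)        # equation (11)
        var_z[k] = V_zz - V_zr @ gain           # equation (12)
        log_w[k] = np.log(amps[k]) + log_gauss(r, m_r, T_rr)
    log_w -= log_w.max()                        # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(), mu_z, var_z             # equation (13)
```

Working with log-weights avoids underflow when an object lies far from most components, which is common when 60 Gaussians tile a high-dimensional flux space.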

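Integrating this one-dimensional mixture of Gaussian distributions over an arbitrary
redshift range results in a sum over error functions. Remembering that our model lives in
log redshift space, and writing $\mu_{z,k}$, $\sigma_{z,k}^2$, and $\alpha_{z,k}$ for the conditional
means, variances, and amplitudes of equations (11)-(13),

$$ \int_{z_{\min}}^{z_{\max}} \mathrm{d}z\, p(z, \{f_j/f_i\} \mid f_i, \mathrm{quasar}) = p(\{f_j/f_i\} \mid f_i, \mathrm{quasar}) \times \sum_k \frac{\alpha_{z,k}}{2} \left( \mathrm{erf}\!\left[\frac{\log z_{\max} - \mu_{z,k}}{\sqrt{2}\,\sigma_{z,k}}\right] - \mathrm{erf}\!\left[\frac{\log z_{\min} - \mu_{z,k}}{\sqrt{2}\,\sigma_{z,k}}\right] \right) , \tag{14} $$

where the error function $\mathrm{erf}[x] = (2/\sqrt{\pi}) \int_0^x e^{-t^2}\, \mathrm{d}t$. The first
factor on the right-hand side of this equation is the integral over the entire redshift range
[0, ∞], which simplifies to

$$ p(\{f_j/f_i\} \mid f_i, \mathrm{quasar}) = \int_0^\infty \mathrm{d}z\, p(z, \{f_j/f_i\} \mid f_i, \mathrm{quasar}) = \sum_k \alpha^k\, \mathcal{N}(\mathbf{r} \mid \mathbf{m}_r^k, \mathbf{T}_{rr}^k) \, , \tag{15} $$

i.e., the denominator in equation (13).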
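The sum over error functions in equation (14) is short to implement. The sketch below computes the fraction of a one-dimensional mixture in log redshift that lies in a given redshift range; it assumes natural logarithms, and the function name is hypothetical.

```python
import math

def prob_in_redshift_range(amps, mu_z, var_z, z_min, z_max):
    """Fraction of a 1-D Gaussian mixture in log z lying between z_min and z_max."""
    total = 0.0
    for a, mu, var in zip(amps, mu_z, var_z):
        s = math.sqrt(2.0 * var)
        # erf tends to +1 as z_max goes to infinity and to -1 as z_min goes to 0.
        hi = 1.0 if math.isinf(z_max) else math.erf((math.log(z_max) - mu) / s)
        lo = -1.0 if z_min <= 0 else math.erf((math.log(z_min) - mu) / s)
        total += 0.5 * a * (hi - lo)
    return total

# One component centered on z = 2 with a width of 0.5 in log z.
mu = math.log(2.0)
p_all = prob_in_redshift_range([1.0], [mu], [0.25], 0.0, math.inf)
p_upper = prob_in_redshift_range([1.0], [mu], [0.25], 2.0, math.inf)
```

Multiplying this fraction by the all-redshift factor of equation (15) gives the left-hand side of equation (14), with no numerical integration required.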

We can compare the quasar probabilities obtained by integrating the XDQSOz model 
over redshift to those from the XDQSO technique, which models the distribution of quasar 
fluxes in three wide redshift bins. Figure [7] shows the probabilities that 490,793 objects
are medium-redshift (2.2 < z < 4.0) quasars obtained by the two methods for objects in 
the SDSS imaging stripe 82. It is clear that most of the objects cluster tightly around the 
one-to-one line and that the two models are essentially the same for this redshift range. 

Figure [8] shows the efficiency of quasar targeting using both the XDQSO and the 
XDQSOz method for targeting medium-redshift (2.2 < z < 4.0) quasars. This test uses 
a sample of medium-redshift quasars spectroscopically confirmed by BOSS — which also re- 
targets quasars previously identified in earlier surveys — in stripe 82. This quasar sample is 
expected to be highly complete, because it was targeted using the superior imaging in stripe
82, where there is variability information (Palanque-Delabrouille et al. 2011) and where a
number of campaigns prior to BOSS have also obtained extensive spectroscopy. The sample
has an average of 30 z > 2.2 quasars deg^-2 down to g ~ 22 mag, which is close to the number
expected from current quasar luminosity functions (e.g., Hopkins, Richards, & Hernquist
2007). We only use regions of stripe 82 that have more than 15 z > 2.2 quasars deg^-2. See
Ross et al. (2011) and Bovy et al. (2011) for a more detailed description of the BOSS quasar
target selection in general and this test set in particular.

The top panel of Figure [8] shows selection based on SDSS ugriz fluxes alone. We see 
that the performance of the XDQSOz and the XDQSO techniques is essentially identical. 
The lower panels of this figure show how the selection improves when we add GALEX UV 
and UKIDSS LAS NIR observations, both of which are available for essentially all objects in 
SDSS stripe 82. The XDQSO models with UV and NIR data are models trained with these 
UV and NIR fluxes in the broad redshift ranges used by XDQSO. For all intents and purposes
the XDQSOz technique performs identically to the XDQSO technique for targeting
medium-redshift quasars.

We have also checked the performance of the XDQSOz technique as compared to the
kernel-density-estimation (KDE) based photometric quasar classification technique of Richards
et al. (2004, 2009a). We find results that are similar to those for the XDQSO technique as shown
in Table 3 of Bovy et al. (2011): at low and medium redshift the XDQSOz technique performs
slightly better than the XDQSO technique (and thus better than the KDE technique), while
at high redshift (z > 3.5) XDQSOz performs slightly worse than XDQSO. This behavior is
expected because the quasar training data do not include much data at high redshift. We 
thus do not probe the color-redshift relation at high redshift as well as the KDE approach, 
which included additional high-redshift data. Because we use the same stellar model as 
XDQSO, the same problem with sampling regions of low stellar density that we encountered 
for XDQSO persists for XDQSOz. 

In summary, the XDQSOz technique performs almost identically to the XDQSO method 
for photometrically classifying objects as quasars — and thus for quasar targeting. XDQSOz 
has the advantage over XDQSO and any other photometric quasar classification scheme that 
it can classify quasars in arbitrary redshift ranges "on the fly" (i.e., without retraining the 
model). 

We have computed XDQSOz quasar probabilities for all 160,904,060 point sources with
dereddened i-band magnitude between 17.75 and 22.45 mag in the 14,555 deg^2 of imaging
from SDSS Data Release 8 (Aihara et al. 2011) in three redshift ranges (0.3 < z < 2,
2 < z < 3, and z > 3). Figure [9] shows the apparent i-band magnitude distribution of
all of the objects with 17.8 < i < 21.5 mag in the expected BOSS spectroscopic footprint
(Eisenstein et al. 2011) with XDQSOz probability larger than 0.5 over the specified redshift
range. These apparent-magnitude distributions are smooth and well-behaved for low and
medium redshifts. They also agree at the bright end with number counts derived from
spectroscopic observations (Richards et al. 2006). At the faint end (i > 21) the i-band number
counts start to decline due to increasing photometric uncertainties and incompleteness of
the SDSS imaging near the faint limit of SDSS.

The BOSS aims to detect the baryon acoustic feature (BAF) in the Lyα forest of
background redshift z > 2.2 quasars. Not all quasars contribute equally to this measurement
and, as shown by McDonald & Eisenstein (2007) and McQuinn & White (2011), both
brighter quasars and quasars near redshift z ~ 2.5 are the most valuable. Combining Lyα
BAF weights with the quasar probabilities as a function of redshift produced by XDQSOz,
we can calculate the expected value of a quasar for the Lyα BAF measurement. Defining
a value function w(g, z), where g is the dereddened g-band magnitude of the object, the
expected value of an object is

$$ \langle \mathrm{quasar\ value} \rangle = \int \mathrm{d}z\, w(g, z)\, p(z, \mathrm{quasar} \mid \mathrm{fluxes}) \, . \tag{16} $$

By targeting objects with the highest expected value for a particular Lyα BAF survey (a value
that depends on the exact observational characteristics of that survey) we could optimize
the targeting of quasars for that BAF measurement.
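Equation (16) can be evaluated on a redshift grid once the redshift PDF is available. The following sketch assumes a mixture defined in natural-log redshift, so that the density in z carries a 1/z Jacobian; the Gaussian-bump value function is a made-up placeholder and not the McDonald & Eisenstein (2007) weights, and all names are hypothetical.

```python
import numpy as np

def redshift_pdf(z, amps, mu_z, var_z):
    """p(z) for a Gaussian mixture defined in log z (includes the 1/z Jacobian)."""
    logz = np.log(z)[:, None]
    comp = np.exp(-0.5 * (logz - mu_z) ** 2 / var_z) / np.sqrt(2 * np.pi * var_z)
    return (comp * amps).sum(axis=1) / z

def expected_value(z, pdf_z, p_quasar, value):
    """Trapezoidal estimate of the integral of w(g, z) p(z, quasar | fluxes) dz."""
    integrand = value * p_quasar * pdf_z
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z)))

z = np.linspace(0.3, 5.5, 5201)
pdf = redshift_pdf(z, np.array([1.0]), np.array([np.log(2.5)]), np.array([0.01]))

# Placeholder value function at fixed g: zero below z = 2.2, peaked near z = 2.5.
toy_value = np.where(z > 2.2, np.exp(-0.5 * ((z - 2.5) / 0.3) ** 2), 0.0)
ev = expected_value(z, pdf, 0.7, toy_value)
```

Ranking objects by `ev` instead of by the medium-redshift quasar probability is the value-based targeting discussed in the text.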

The top panel of Figure [10] shows the number of medium-redshift quasars found by applying
this value-based targeting for BOSS using the value function of McDonald & Eisenstein
(2007). Value-based targeting finds about 1 quasar deg^-2 fewer than targeting based on the
ranked medium-quasar probability list. The bottom panel shows that value-based targeting
finds as much value as the straight probability-based targeting — but not more — such that the 
BAF measurement based on both samples should be equally precise. Straight probability- 
based targeting thus finds the same value while assembling a larger overall quasar sample. 
In addition, value-based targeting optimizes one experiment in a specific survey, whereas 
straight probability-based targeting returns information that is broadly applicable to a range 
of experiments and a range of surveys. Thus, in general, there is little to be gained from 
pursuing value-based targeting for BOSS. 

To investigate whether the XDQSOz quasar selection technique is limited by contamination
from galaxies that appear point-like at the faint flux levels to which we push quasar
classification, we look at the fraction of objects that appear point-like in a single SDSS
imaging pass but are extended in co-added data on SDSS imaging stripe 82. We match
the point sources in a typical SDSS imaging run to the co-added galaxy catalog on stripe
82 (Abazajian et al. 2009) and assess the fraction of point sources that are extended in the
co-added data as a function of the i-band magnitude. This is shown in the top panel of
Figure [11]. We see that the fraction of point sources that are extended in the co-added
imaging is only a few percent at relatively bright magnitudes, but almost reaches 50 percent
at i = 22 mag. To assess whether these point-like galaxies are a significant contaminant for
the XDQSOz quasar selection, we calculate their quasar probabilities (over all redshifts).
The fraction of point-like sources that are extended in the co-added imaging and that have
quasar probabilities larger than 0.5 is shown as a function of the i-band magnitude in the
lower panel of Figure [11]. For comparison, the fraction of all point sources with quasar
probability larger than 0.5 is shown as the dashed curve. Point-like galaxies make up only a small
(< 10 percent) fraction of XDQSOz-selected photometric quasars. However, because galaxies,
unlike stars, cluster similarly to quasars, even this small contamination fraction might
significantly degrade precision quasar-clustering measurements without improved star-galaxy
separation or proper modeling. Given the rising fraction of point-like galaxies with increasing
magnitude in the top panel of Figure [11], point-like galaxies are likely to be the major
contaminant for quasar selection at i > 23 mag.



6. Photometric redshifts with XDQSOz 

We can use the XDQSOz flux-redshift density model to derive full posterior probability
distributions for the redshift of photometric quasars, taking the photometric uncertainties of
the object fully into account. Because the main advantages of the XDQSOz technique for
photometric redshift estimation are that it a) returns full PDFs and b) allows auxiliary data
such as that furnished by UV and NIR surveys to be included, we focus on those points here.
ugriz-only photometric quasar redshifts suffer from various degeneracies that are inherent to
the ugriz filter system. While including appropriate apparent-magnitude-dependent redshift
priors (as we do here) can partially relieve these degeneracies, no photometric
redshift technique can entirely remove them and XDQSOz is no exception (as
we will see below). Crucially, even low signal-to-noise ratio UV and NIR data can cleanly
resolve these degeneracies.

For each object the posterior probability distribution for its redshift, based on its
measured broadband fluxes, is calculated by finding the apparent-magnitude bin that best
matches the object's dereddened i-band magnitude. The posterior probability distribution
is given by the mixture of 60 one-dimensional Gaussian distributions, with means, variances,
and amplitudes given in equations (11), (12), and (13), respectively. This posterior
probability distribution can be calculated based on ugriz fluxes, or with additional UV or NIR
information if available.

We show four examples of such redshift PDFs in Figure [12] for objects from the SDSS 
DR7 quasar catalog that have measurements in all of the UV and NIR filters. These objects
are from the sample used to train the flux-redshift quasar model, but, as we discuss in more
detail below, we nevertheless believe that they provide an adequate representation of the 
performance of the XDQSOz technique. These objects are chosen to demonstrate the power 
and weaknesses of the ugriz, UV, and NIR data for photometric redshift estimation, and 
are therefore not a random subset of the data. We discuss the overall performance below. 

The top left panel shows an example where the ugriz fluxes suffice to accurately and 
precisely measure the redshift, and how the (relatively high signal-to-noise ratio) UV and 
NIR measurements tighten the PDF significantly. The top right panel shows an example 
where the extremely low UV flux basically vetoes the low-redshift peak that is present in the
ugriz-only redshift PDF. If one were to use a simple non-detection GALEX catalog at 5σ
this result would not have been clear, because this object could have had the mean z ~ 0.8
UV flux and still not be detected by GALEX. The ability of the XDQSOz technique to use 
and interpret low signal-to-noise ratio data is therefore crucial in this example. 

A weakness of the auxiliary UV data is apparent in the lower left panel. Here we see a 
z = 1.6 quasar that is much brighter than the average quasar at this redshift in the UV, such 
that the addition of the UV data mistakenly chooses the low-redshift peak of the degenerate 
ugriz redshift PDF. However, the NIR data are able to overcome this error and the addition 
of all the data confidently assigns this object a close-to-correct redshift. The lower right 
panel of Figure [12] shows another amusing example. 

In addition to testing the XDQSOz technique using the SDSS DR7 quasar sample,
we have also drawn a sample of quasars located in the SDSS imaging in stripe 82 discovered
as part of the 2SLAQ survey (Croom et al. 2009) and BOSS (Ross et al. 2011;
Palanque-Delabrouille et al. 2011). These quasars are generally fainter than the SDSS
quasars and therefore they represent a stringent, independent test of the XDQSOz tech- 
nique's ability to return accurate redshift PDFs at faint magnitudes. We have specifically 
selected all 0.3 < z < 5.5 quasars with dereddened i > 19.1 mag from the 2SLAQ sample 
and all quasars on the SDSS imaging stripe 82 newly discovered by BOSS and use their 
single-pass SDSS photometry. Because most of these objects lie in the SDSS equatorial 
stripe, many of them have measurements from GALEX and UKIDSS LAS. 

Figure [13] shows posterior probability distributions for the redshift of two objects from 
the 2SLAQ catalog and two from the BOSS sample. The trends that were apparent for 
the SDSS DR7 quasars in Figure [12] are also evident for these fainter objects. The UV and 
NIR fluxes for these objects have, in general, been measured much less precisely than those 
discussed above, but the auxiliary data still provide valuable extra information about the 
redshift.

Most of the examples in Figures [12] and [13] have considerable posterior probability
mass associated with the correct redshift, even when multiple peaks are present in the
redshift PDF. This situation is generic: inspection of many redshift PDFs shows that
it is rare to have no posterior probability mass associated with the spectroscopic redshift.

As a simple statistic for the degeneracy in the redshift PDF we examine the number
of distinct peaks as a function of redshift. A single peak in the PDF is defined here as the
widest contiguous region where the PDF is above the uniform distribution between redshift
0.3 and 5.5 (i.e., flat in redshift). The top panel of Figure [14] shows the average number of
such peaks as a function of redshift. This statistic clearly shows the main degeneracies of
ugriz-based photometric quasar redshifts. Basically the entire z < 1 region, a region around
z = 1.5, and the 2.0 < z < 2.7 redshift range are degenerate. Higher redshift quasars are
readily identified as such using the ugriz colors (e.g., Fan et al. 1999). From the lower panels
we see that the addition of UV and NIR data softens all of these degeneracies. Essentially no
degeneracies remain using the combination of all the UV, optical, and NIR data (lower panel
of Figure [14]). Additionally, requiring that distinct peaks in the photometric-redshift PDF
have a minimum integrated probability (e.g., defining a peak as a contiguous region
above the uniform distribution with > 0.05 integrated redshift probability) gives similar
results for the number of peaks and the improvement when adding UV and NIR data.
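The peak-counting statistic defined in this paragraph can be sketched for a PDF tabulated on a redshift grid. The function name and the toy three-peaked PDF are hypothetical, and the optional mass threshold is interpreted here as the integrated probability of the region above the uniform level.

```python
import numpy as np

def count_peaks(z, pdf, z_lo=0.3, z_hi=5.5, min_prob=0.0):
    """Count contiguous regions where pdf exceeds the uniform density on [z_lo, z_hi]."""
    above = pdf > 1.0 / (z_hi - z_lo)
    # Pad with False so that regions touching the grid edges are closed properly.
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)       # exclusive end indices
    n_peaks = 0
    for a, b in zip(starts, stops):
        seg_z, seg_p = z[a:b], pdf[a:b]
        mass = np.sum(0.5 * (seg_p[1:] + seg_p[:-1]) * np.diff(seg_z))
        if mass >= min_prob:
            n_peaks += 1
    return n_peaks

def gauss(z, mean, sigma):
    return np.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Toy PDF: two strong peaks and one weak peak.
z = np.linspace(0.3, 5.5, 5201)
pdf = 0.57 * gauss(z, 0.8, 0.05) + 0.40 * gauss(z, 2.3, 0.05) + 0.03 * gauss(z, 4.0, 0.05)

n_all = count_peaks(z, pdf)
n_significant = count_peaks(z, pdf, min_prob=0.05)
```

In this toy case the weak peak rises above the uniform level but carries less than 0.05 of the probability, so it is dropped once the minimum-mass requirement is imposed.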

Figure [15] shows the traditional spectroscopic-redshift vs. photometric-redshift diagram 
for quasars in the SDSS DR7 quasar sample, for various combinations of wavelength regimes. 
The right panels restrict the sample to those objects for which the redshift PDF has only a 
single peak and as such can be accurately described by a single photometric redshift (plus 
uncertainty). In the top left panel all of the ugriz-related degeneracies are clearly present
and the right panel shows that by restricting the sample to single-peaked PDFs most of these
degeneracies vanish, albeit at the cost of entire redshift ranges, most notably quasars in the
redshift range 2.0 < z < 2.5. The addition of UV and especially that of NIR observations a) greatly
reduces the degeneracies as witnessed by the diminishing structure in the spectroscopic vs.
photometric redshift plane and the increasing fraction of objects with a single-peaked redshift
PDF, and b) significantly reduces the scatter. In the individual panels we report the number
of 4σ outliers rather than the number of |Δz| > 0.3 objects; the latter number is somewhat
meaningless without comparing it to the scatter, but to guide the eye we have included
the |Δz| = 0.3 lines. The scatter is calculated without outlier rejection. We note that,
here and in the test below, the distribution of the i-band magnitude is unchanged when
restricting the sample to objects with measured GALEX or UKIDSS fluxes; restricting to
objects with NIR fluxes actually creates a fainter sample, because many faint quasars in the
SDSS imaging stripe 82 have been observed by UKIDSS LAS, while many brighter quasars
are located outside of the UKIDSS LAS footprint.

With the addition of UV and NIR data most objects have accurate and precise
single-peaked photometric redshifts over the entire 0.3 < z < 5.5 redshift range: 97 percent of all
objects with UV and NIR data and 99 percent of the subset with single-peaked redshift PDFs
have photometric redshifts within |Δz| < 0.3; for |Δz| < 0.1 these numbers are 84 percent
and 86 percent, respectively. This is a significant improvement over ugriz-only photometric
redshifts, where we find 86 percent of objects within |Δz| < 0.3. Similarly, Weinstein et al.
(2004) found 83 percent of objects in this range.
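The summary statistics quoted here are simple to compute from matched spectroscopic and photometric redshifts. The sketch below is an illustration with made-up numbers; it assumes that the photometric redshift is taken as the peak of the redshift PDF and that the scatter is the standard deviation of the redshift differences, neither of which is spelled out in the text, and the function names are hypothetical.

```python
import numpy as np

def peak_redshift(z_grid, pdf):
    """Point estimate: the grid redshift at which the PDF is largest."""
    return float(z_grid[np.argmax(pdf)])

def photoz_summary(z_spec, z_phot):
    """Fractions within |dz| < 0.1 and 0.3, and the scatter without outlier rejection."""
    dz = np.asarray(z_phot) - np.asarray(z_spec)
    return {
        "frac_within_0.1": float(np.mean(np.abs(dz) < 0.1)),
        "frac_within_0.3": float(np.mean(np.abs(dz) < 0.3)),
        "scatter": float(np.std(dz)),
    }

# Four made-up objects, for illustration only.
z_spec = np.array([1.0, 2.0, 3.0, 0.5])
z_phot = np.array([1.05, 2.2, 3.5, 0.5])
summary = photoz_summary(z_spec, z_phot)
```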



Photometric and spectroscopic redshifts in Figure [15] are compared for objects in the
sample used to train the XDQSOz technique. As such, one might object that this is not
a fair representation of the performance of the XDQSOz technique. However, the relationship
between the photometrically estimated redshift PDF and the spectroscopic redshift of a
training object for XDQSOz is through the many-parameter flux-redshift density model.
This model includes reweighting objects in the training set according to a redshift prior.
There is therefore no direct connection between output photometric redshifts and input
spectroscopic redshifts, as there is, for example, in nearest-neighbor approaches to
photometric redshift estimation (Ball et al. 2007, 2008). The fact that Figure [15] contains all
of the expected redshift degeneracies for ugriz-based photometric redshifts is further evidence
of this independence: if there were a direct connection we would not suffer from these
degeneracies.

To further test this issue we have divided our sample into a 90 percent training sample
and a 10 percent test sample, as described above in Section [4.1]. We redo the
spectroscopic-redshift vs. photometric-redshift comparison for the 10 percent sample using the model
trained on the remaining 90 percent of the data; the results are in Figure [16]. Because the
10 percent sample is much smaller than the full SDSS DR7 quasar sample the statistics are
noisier, but the trends are the same as in Figure [15].

To test the XDQSOz technique at fainter magnitudes, in Figure [17] we compare
spectroscopic redshifts to photometric redshifts for i > 20.1 objects in the SDSS DR7 quasar
catalog and for objects in the combined 2SLAQ and BOSS sample. The trends in this figure 
are the same as those for the brighter SDSS quasar sample and the scatter is somewhat 
larger; however, the photometric redshifts remain clustered around the spectroscopic red- 
shifts with no discernible bias. Even for the faint 2SLAQ and BOSS sample, the addition 
of low signal-to-noise ratio UV and NIR data leads to a significant increase in accuracy and 
precision. 

Using the technique described in this section we have computed photometric redshifts
for all point sources in the expected BOSS spectroscopic footprint with XDQSOz quasar
probabilities larger than 0.5 and 17.8 < i < 21.5 mag. The distribution of peaks of the
photometric-redshift distribution for these objects is shown in Figure [18] in a few
apparent-magnitude bins (those bins from Figure [1] that lie within the 17.8 < i < 21.5
apparent-magnitude range). The overall shape of the redshift distribution in each i-band bin is similar
to the redshift prior calculated from the Hopkins, Richards, & Hernquist (2007)
luminosity-function model. However, the redshift-dependent efficiency of photometric quasar
classification and redshift estimation is apparent in this comparison and the low classification
efficiency at 2.5 < z < 3.5 depresses the distribution in that range while increasing the
significance of the z ~ 1.5 peak.



All of the results in this section have assumed the Hopkins, Richards, & Hernquist
(2007) redshift prior, shown in Figure 1. Using the difference between the Hopkins, Richards,
& Hernquist (2007) and Richards et al. (2006) redshift priors, given in the bottom panel of
Figure 1, we can assess the difference in photometric redshift distribution when using these
two alternatives for the quasar luminosity function. The bottom panel of Figure 1 shows
that the only significant difference between these two models is at relatively high redshift
(z > 2.5) and near the SDSS detection limit (i > 21 mag). Strongly single-peaked photometric
redshift distribution functions, such as many of those shown in Figures 12 and 13, are
not affected by even order-of-magnitude changes in the redshift prior, especially when UV
or NIR data are available. It is clear from Figures 15 and 17 that, on average, the influence
of a different redshift prior will be limited, as the main differences lie at higher redshift,
where the SDSS colors provide relatively unambiguous photometric redshifts (as shown by
the lack of degeneracies at higher redshift in the photometric versus spectroscopic redshift
plane). As the photometric redshift distributions are only marginally affected by the use of
a different prior, classification based on integration over these redshift PDFs also does not
depend strongly on the details of the redshift prior.
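
The claim that sharply peaked posteriors are insensitive to the prior can be illustrated with a toy calculation. The Python sketch below is not part of the XDQSOz code: the Gaussian likelihood shapes, the two priors (a flat one and a hypothetical one boosted tenfold at z > 2.5), and all function names are assumptions chosen purely for illustration.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized one-dimensional Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def map_redshift(likelihood, prior, grid):
    """Grid redshift that maximizes likelihood(z) * prior(z)."""
    return max(grid, key=lambda z: likelihood(z) * prior(z))

# Redshift grid spanning roughly 0.3 < z < 5.5 (illustrative choice).
grid = [0.3 + 0.01 * k for k in range(521)]

def flat_prior(z):
    return 1.0

def boosted_prior(z):
    # Hypothetical alternative prior: ten times more weight at z > 2.5.
    return 10.0 if z > 2.5 else 1.0

def single_peaked(z):
    # Toy likelihood with one narrow peak at z = 1.5.
    return gaussian(z, 1.5, 0.05)

def double_peaked(z):
    # Toy degenerate likelihood with peaks at z = 0.8 and z = 3.0.
    return 0.6 * gaussian(z, 0.8, 0.1) + 0.4 * gaussian(z, 3.0, 0.1)

z_single_flat = map_redshift(single_peaked, flat_prior, grid)
z_single_boost = map_redshift(single_peaked, boosted_prior, grid)
z_double_flat = map_redshift(double_peaked, flat_prior, grid)
z_double_boost = map_redshift(double_peaked, boosted_prior, grid)
```

In this toy setup the single-peaked case returns the same peak redshift under both priors, whereas the artificially degenerate case switches peaks. This matches the qualitative statement above that only objects with unresolved degeneracies are sensitive to the choice of prior.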



7. Discussion



7.1. Comparison with other methods 



We have previously discussed other photometric redshift estimation techniques for
quasars in an earlier section. Comparing Figure 15 to similar diagrams in Budavári et al.
(2001), Richards et al. (2001b), and Ball et al. (2007, 2008), we see, at least qualitatively,
that the XDQSOz technique performs similarly when applied to the ugriz fluxes of bright, high
signal-to-noise ratio objects. We did not expect to perform better, as the near-degeneracies
in the ugriz color-redshift plane are real and the quasar locus is broad. The advantage of
the XDQSOz technique over these other techniques is that it can be applied to faint objects
and that it can incorporate UV and NIR observations, even at low signal-to-noise ratio, to
improve photometric redshift estimation and quasar classification.



No other method exists to calculate photometric quasar probabilities over arbitrary
redshift ranges. By comparing with state-of-the-art photometric quasar classification using
kernel-density estimation or Gaussian mixture density deconvolution (Richards et al. 2004;
Bovy et al. 2011), we have shown that the photometric quasar probabilities obtained by
integrating the photometric redshift PDF over redshift are as good as those trained on the
redshift range in question.
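
The integration over redshift described above can be sketched numerically. The following Python fragment is a minimal illustration, not the XDQSOz code. It assumes that the joint density p(fluxes, z | quasar) has already been evaluated on a redshift grid for one object; the function names, the toy density, the star likelihood, and the prior P(quasar) = 0.1 are all hypothetical.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized one-dimensional Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid(values, grid):
    """Trapezoidal-rule integral of tabulated values over grid."""
    return sum(0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
               for k in range(len(grid) - 1))

def quasar_probability(joint, grid, z_min, z_max, star_like, prior_quasar):
    """P(quasar with z_min <= z <= z_max | fluxes) from a tabulated joint density.

    joint[k] is p(fluxes, z = grid[k] | quasar) for the observed fluxes;
    star_like is p(fluxes | star); stars are the only other class here.
    """
    inside = [(z, p) for z, p in zip(grid, joint) if z_min <= z <= z_max]
    numerator = prior_quasar * trapezoid([p for _, p in inside],
                                         [z for z, _ in inside])
    evidence = (prior_quasar * trapezoid(joint, grid)
                + (1.0 - prior_quasar) * star_like)
    return numerator / evidence

# Hypothetical tabulated joint density with two redshift solutions.
grid = [0.3 + 0.01 * k for k in range(521)]
joint = [2.0 * (0.7 * gaussian(z, 1.0, 0.2) + 0.3 * gaussian(z, 3.0, 0.2))
         for z in grid]

p_all = quasar_probability(joint, grid, 0.0, 10.0, star_like=1.0, prior_quasar=0.1)
p_mid = quasar_probability(joint, grid, 2.2, 4.0, star_like=1.0, prior_quasar=0.1)
```

Because the same tabulated density serves every redshift interval, probabilities for any range follow from a change of integration limits without retraining a model for that range.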



7.2. Including additional information 

Two additional sources of information relevant to photometric quasar classification and 
redshift estimation stand out as the next steps toward a full quasar model, although neither 
of these is currently available over the large areas of the sky surveyed by projects such as 
the SDSS: photometric variability and differential-chromatic-refraction-induced astrometric 
offsets for quasars. Of these, photometric variability is the easiest to include in the quasar 
classification technique discussed here, as we can ignore the redshift information contained
in the variability, because this information seems to be limited (MacLeod et al. 2011a). As
such, a photometric variability likelihood for quasars and stars could be multiplied with
the flux-redshift likelihood employed and modeled here to perform simultaneous color and
variability selection. The combination of photometric variability and color information will
lead to accurate photometric quasar classification and redshift estimation in the LSST era.
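
A minimal sketch of such a combined selection is given below. It assumes that colors and variability are conditionally independent given the class, so that the two likelihoods can simply be multiplied; the function name, the class labels, and all numerical values are hypothetical and serve only to show the bookkeeping.

```python
def combined_class_probabilities(color_like, variability_like, priors):
    """Posterior class probabilities from multiplied color and variability likelihoods.

    Each argument is a dict keyed by class name. The color likelihood is assumed
    to be already marginalized over the internal properties of the class
    (for quasars, over redshift).
    """
    evidence = {name: priors[name] * color_like[name] * variability_like[name]
                for name in priors}
    total = sum(evidence.values())
    return {name: value / total for name, value in evidence.items()}

# Hypothetical likelihood values for a single object.
priors = {"quasar": 0.1, "star": 0.9}
color_like = {"quasar": 2.0, "star": 1.0}
variability_like = {"quasar": 5.0, "star": 0.5}
no_variability = {"quasar": 1.0, "star": 1.0}

color_only = combined_class_probabilities(color_like, no_variability, priors)
combined = combined_class_probabilities(color_like, variability_like, priors)
```

With these made-up numbers the quasar probability rises from about 0.18 using colors alone to about 0.69 once the variability likelihood is included.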

The strong spectral features of quasars induce positional offsets relative to standard
differential-chromatic-refraction corrections (Kaczmarczik et al. 2009). These positional
offsets are redshift dependent, much as quasar colors are redshift dependent, because of
spectral features moving through individual filters (see, e.g., Figure 4). Thus, these offsets
could be used to break redshift degeneracies. Accomplishing this in the XDQSOz flux-redshift
density context necessitates adding the astrometric offsets into the density model. Because
the astrometric offsets are zenith angle dependent, these models would have to be constructed
for a range of airmasses, or airmass could be added as an additional dimension. As astrometric
redshifts are a subtle and difficult-to-measure effect, the deconvolution aspect of the XD
density-estimation technique could be useful.

7.3. Generalized photometric object classification and characterization



The technique described in this article is a step toward a generalized method for object
classification and characterization from broadband photometric data, which will become
increasingly relevant in this era of major wide-field imaging surveys. While our quasar
model includes redshift in addition to the broadband fluxes of an object, our star model
does not because stars do not possess a cosmological redshift. Stars are characterized by
other properties (e.g., distance and metallicity) that are often estimated photometrically
(e.g., Jurić et al. 2008; Ivezić et al. 2008). As we are interested here in quasar classification
and characterization, our model implicitly marginalizes over stellar properties. However, as
part of a general object classification pipeline, these properties should be included, and the
technique developed in this paper could be applied.

More importantly, the general framework outlined in Section 2 and its specific
implementation in Section 4 and the sections that follow show that we can perform
classification when different models are characterized by different parameters, or even
different numbers of parameters. This aspect is especially relevant in the context of quasar
selection based on variability. Quasar variability is commonly modeled as a stochastic
Gaussian process (Kelly et al. 2009; Kozłowski et al. 2010a) characterized by a small number
of parameters. Recently it has been shown that this framework allows for a clean selection of
quasars because most stars (the main contaminants for quasar targeting currently) in general
do not vary over long time baselines. In this type of selection, however, quasars and stars
are often modeled (or fit) using the same stochastic model, which is inappropriate for the
stars (Schmidt et al. 2010; MacLeod et al. 2011b; however, see Butler & Bloom 2010).



The use of a stochastic model for variability-based star-quasar separation is particu- 
larly problematic for RR Lyrae stars — a common contaminant in color-based classification 
of quasars in some redshift ranges. RR Lyraes are known to vary periodically rather than 
stochastically. In the framework we use in this article all classes of objects can be de- 
scribed using models appropriate for the class — e.g., stochastically varying objects with a 
cosmological redshift for the quasars and non-variable sources for most stars — because object 
classification only uses marginalized probabilities, that is, probabilities marginalized over the 
internal properties of each class (cf. equation (5)). Describing each class with a model
appropriate for that class should lead to better classification and to the simultaneous
classification of sources into all classes.

As photometric quasar classification moves to ever fainter flux levels, contamination
from point-like galaxies becomes increasingly important. As discussed by Bovy et al. (2011),
unresolved galaxies are implicitly taken into account in our model because our training set
of "stars" is actually a set of non-variable point-like objects that therefore includes faint
galaxies. This model of galaxies again implicitly marginalizes over galaxy properties, most
notably the redshift of the galaxy. Photometric redshift estimation for galaxies is closely
related to obtaining photometric redshifts for quasars, but the galaxy photometric redshift
techniques tend to rely more on templates in their model building, while quasars are modeled
in a more empirical manner (e.g., Benítez 2000). However, this distinction is not fundamental
and the general framework discussed here still applies. Template-based models are just
another way of obtaining the probability p(fluxes | galaxy) or p(fluxes, z | galaxy).



7.4. Quasar tracks 

The XDQSOz model of Section 4 also contains the distribution of broadband fluxes as a
function of redshift, p(fluxes | z, quasar), such as is used to compute mean quasar color tracks.
This probability density is obtained from the full XDQSOz flux-redshift density model by
conditioning on redshift. For the relative flux this leads to a mixture of Gaussians with
means, variances, and amplitudes given by expressions similar to those in equations (11)
and (13); essentially, relative flux and redshift need to be interchanged in those equations.
Properties of this distribution can be calculated from the mixture of Gaussians. For example, 
the mean quasar relative flux (or color) as a function of redshift is obtained by weighting 
the means of the Gaussian components using the redshift-dependent amplitudes. Code to 
calculate the mean quasar color track is included in the package described in the Appendix. 
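
The conditioning step can be sketched for a single relative flux as follows. This is a minimal illustration and not the code distributed with the package: it assumes a two-dimensional Gaussian mixture in (ln z, relative flux), uses the standard formulas for conditioning a bivariate Gaussian, and the component values and function names are made up.

```python
import math

def normal_pdf(x, mean, var):
    """One-dimensional Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def condition_on_redshift(components, lnz):
    """Condition a mixture in (ln z, flux) on ln z.

    components: list of (amplitude, (m_z, m_f), (v_zz, v_zf, v_ff)).
    Returns a list of (weight, conditional mean, conditional variance) for
    the flux, with weights normalized to sum to one.
    """
    conditioned = []
    for amp, (m_z, m_f), (v_zz, v_zf, v_ff) in components:
        weight = amp * normal_pdf(lnz, m_z, v_zz)   # redshift-dependent amplitude
        mean = m_f + v_zf / v_zz * (lnz - m_z)      # conditional mean of the flux
        var = v_ff - v_zf ** 2 / v_zz               # conditional variance of the flux
        conditioned.append((weight, mean, var))
    total = sum(w for w, _, _ in conditioned)
    return [(w / total, m, v) for w, m, v in conditioned]

def mean_track(components, z):
    """Mean relative flux at redshift z: amplitude-weighted conditional means."""
    return sum(w * m for w, m, _ in condition_on_redshift(components, math.log(z)))

# Hypothetical two-component model.
components = [
    (0.5, (0.0, 1.0), (0.04, 0.01, 0.02)),
    (0.5, (1.0, 2.0), (0.04, 0.00, 0.02)),
]
track_mid = mean_track(components, math.exp(0.5))
track_low = mean_track(components, 1.0)
```

Evaluating mean_track on a grid of redshifts traces out the mean track; other summaries, such as the conditional variance, follow from the same list of conditioned components.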



8. Conclusion 

In this article we have introduced a new approach to photometric quasar classification
that can simultaneously classify quasars and characterize their redshifts based on broadband
photometry. This technique, XDQSOz, is an extension of the XDQSO technique of
Bovy et al. (2011) that adds the unknown redshift as an extra parameter to the quasar
model to obtain the likelihood p(z, fluxes | quasar) that is central to both quasar classification
and photometric redshift estimation. We have shown that this combined approach is both
the best current quasar classification technique (its performance is similar to that of the
XDQSO method) and a competitive photometric redshift method. Compared to other approaches
to photometric redshift estimation for quasars it has the advantage that it can incorporate
additional UV and NIR data, even at low signal-to-noise ratio, and can be extended to
fainter flux levels where photometric uncertainties are significant. Using samples of quasars
drawn from the SDSS, 2SLAQ, and BOSS spectroscopic catalogs we have demonstrated this
increased performance down to g ~ 22 mag. The addition of UV and NIR data to the
photometric redshift estimation problem essentially breaks all of the redshift degeneracies
inherent to the ugriz filter set.

Code to use the XDQSOz technique for classification and redshift estimation, including
the ability to calculate full posterior probability distributions for the redshift, is made
publicly available. This code is briefly described in the Appendix.

A. Code

The XDQSOz code for target selection, classification, and photometric redshift estima- 
tion is publicly available at 



http://www.sdss3.org/svn/repo/xdqso/tags/ .



The code can be downloaded by svn export of the most recent tag. The documentation of 
the most recent version of the code can be found at 



http://www.sdss3.org/svn/repo/xdqso/tags/v0_6/doc/build/html/index.html



Future updates will have documentation available at a similar URL. 

The XDQSO / XDQSOz package contains routines for quasar classification using the
XDQSO and XDQSOz techniques. It also contains code to calculate posterior probability
distributions for the quasar redshift of objects based on input psfflux and psfflux_ivar;
these can be found in standard SDSS data files such as the 'sweeps' files (see
http://data.sdss3.org/datamodel/files/PHOTO_SWEEP/RERUN/calibObj.html).

The XDQSOz models for the quasar color-redshift density are contained in the data/
directory. They are in the form of FITS files containing the XD models for all of the
bins in apparent magnitude, with one file for each combination of SDSS with GALEX and
UKIDSS. Each FITS file contains 47 extensions, where extension k contains a structure with
the amplitudes (tag xamp), means (tag xmean), and covariance matrices (tag xcovar) for
bin k in i-band magnitude. The zeroth dimension of the Gaussian represents the natural
logarithm of the redshift, followed by SDSS, GALEX, and UKIDSS fluxes (where relevant)
in this order, and ordered as NUV/FUV for GALEX and YJHK for UKIDSS.
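
Because the zeroth dimension is ln z rather than z, converting the model to a density in redshift requires a Jacobian factor of 1/z. The sketch below illustrates this for the redshift marginal of a mixture, which uses only the zeroth element of each mean and the (0, 0) element of each covariance matrix. Reading the FITS files is not shown; the plain Python lists, the numerical values, and the function name are hypothetical stand-ins for the contents of the xamp, xmean, and xcovar tags, and the amplitudes are assumed to sum to one.

```python
import math

def redshift_density(z, amps, lnz_means, lnz_vars):
    """Density in z implied by a Gaussian mixture whose zeroth dimension is ln z."""
    lnz = math.log(z)
    p_lnz = sum(a * math.exp(-0.5 * (lnz - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for a, m, v in zip(amps, lnz_means, lnz_vars))
    return p_lnz / z  # Jacobian |d ln z / dz| = 1 / z

# Hypothetical zeroth-dimension parameters of a two-component mixture.
amps = [0.6, 0.4]
lnz_means = [0.4, 1.0]
lnz_vars = [0.09, 0.04]

# Check that the density integrates to one over a wide redshift range.
step = 0.001
grid = [0.01 + step * k for k in range(30000)]
values = [redshift_density(z, amps, lnz_means, lnz_vars) for z in grid]
total = sum(0.5 * (values[k] + values[k + 1]) * step for k in range(len(grid) - 1))
```

Omitting the 1/z factor would give a curve that does not integrate to one in z and whose peak is shifted relative to the true redshift density.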
Fig. 1. Prior distribution for the redshift in a few i-band bins (top panel). The histogram
shows the redshift distribution of 69,994 quasars from the SDSS DR7 quasar catalog with
dereddened i-band magnitude < 19.1, where the quasar catalog is highly complete (except
for the redshift range 2.5 < z < 3.2). The bottom panel shows the difference in prior when
using the Richards et al. (2006; hereafter R06) quasar luminosity function rather than the
fiducial Hopkins, Richards, & Hernquist (2007; hereafter HRH07) model.
Fig. 2. Cumulative distribution of signal-to-noise ratio for those quasars in the SDSS DR7
quasar sample observed by GALEX (~62,628 objects; top panel) and UKIDSS LAS
(~25,510 objects; bottom panel). See Sections 3.2 and 3.3 for the number of objects in each
individual bandpass. The five-sigma detection limit is indicated.
Fig. 3. Total probability of the spectroscopic redshifts of objects in the 10 percent test
sample given their ugriz fluxes using models with different numbers of Gaussians trained on
the remaining 90 percent of objects in the SDSS DR7 quasar catalog.



[Figure 4 image: panels titled "resampled quasar data"; horizontal axes labeled "redshift".]



Fig. 4. Flux-redshift and color-redshift diagrams for the 18.6 < i < 18.8 bin in apparent
magnitude for the 103,577 objects in the quasar catalog. The first column shows a conditional
plot of a sampling from the extreme deconvolution fit with the errors from the quasar
data added; the second column presents the quasar data resampled according to the quasar
luminosity function as described in Section 3 and in more detail in Bovy et al. (2011). All
fluxes are relative to the i-band flux of the object. The third and fourth columns show the
same information as the first and second columns, but for colors. Linear conditional densities
are shown as well as the 25, 50, and 75 percent quantile lines. The vertical lines denote where
prominent emission lines pass in and out of the relevant filters (Lyα: full; CIV: dotted; CIII:
dashed; MgII: dash-dotted; Hα: dash-dot-dot-dotted). Although only the conditional relation
between redshift and flux/color is shown here, we fit the full density in the flux-redshift
space.
Fig. 5. Flux-redshift diagrams for the 18.6 < i < 18.8 bin in apparent magnitude for
the 62,628 SDSS quasars with GALEX observations in both GALEX bandpasses. The left
column shows a conditional plot of a sampling from the extreme deconvolution fit with the
errors from the quasar data added; the right column displays the quasar data resampled
according to the quasar luminosity function as described in Section 3. All fluxes are relative
to the i-band flux of the object. Densities, curves, and vertical lines are as in Figure 4. The
thick light-gray bands show where the Lyman limit (λ912 Å) crosses the UV filters.
Fig. 6. Same as Figure 5 but for the 25,510 SDSS quasars that have UKIDSS LAS
observations in all four UKIDSS bandpasses.
Fig. 7. Comparison between mid-redshift (2.2 < z < 4.0) quasar probabilities computed
using XDQSO, that is, based on flux-density models in broad redshift ranges, and XDQSOz,
i.e., obtained by integrating flux-redshift-density models over the relevant redshift range,
for 490,793 objects in SDSS Stripe 82 based on single-imaging-run flux measurements.
Conditional 25, 50, and 75 percent quantiles are shown.
Fig. 8. Mid-redshift (2.2 < z < 4.0) quasar selection efficiency for XDQSO and XDQSOz
as a function of target density for objects in SDSS Stripe 82 based on single-imaging-run
flux measurements. The top panel bases selection solely on SDSS ugriz fluxes; the lower
panels add GALEX NUV and FUV medium-deep measurements as well as UKIDSS YJHK
photometry, both of which are available for almost all Stripe-82 sources through forced
photometry of GALEX and UKIDSS LAS imaging data at SDSS positions. The 50 percent
selection efficiency is indicated and the ugriz-only curve for XDQSOz is repeated in gray in
each panel.
Fig. 9. Apparent z-band magnitude distribution of all point sources in the expected SDSS-III
BOSS footprint with XDQSOz quasar probability larger than 0.5 over the indicated
redshift range and dereddened i between 17.8 mag and 21.5 mag. Diamonds indicate number
counts from the SDSS spectroscopic survey (Richards et al. 2006).
Fig. 10. Comparison between "value-based" and straight probability-based quasar selection
for BOSS. "Value-based" selection ranks targets on the expected signal-to-noise ratio
of the Lyman-α forest, while probability-based selection ranks on P(2.2 < z < 4.0 quasar).
The top panel shows the number of mid-redshift quasars found by each method as a function
of the target density; the bottom panel shows the value of the selected quasars. Note that
some z < 2.2 quasars are valuable for the Lyman-α forest BAF measurement.
Fig. 11. Point-like galaxy contamination of color-based quasar selection: The top panel
shows the fraction of point sources in a single imaging pass of SDSS Stripe 82 that are
extended in the co-added Stripe-82 imaging (Abazajian et al. 2009) as a function of the
i-band magnitude. The bottom panel shows the fraction of such point-like galaxies that
have an XDQSOz quasar probability (over all redshifts) larger than 0.5. The dashed curve
in the bottom panel shows the fraction of all point sources that have an XDQSOz quasar
probability larger than 0.5. Even though point-like galaxies start to dominate the number
counts around i = 22 mag, they only make up a small fraction of photometrically selected
quasars at all magnitudes.
Fig. 12. Posterior distribution functions for the photometric redshift of four quasars from
the SDSS DR7 quasar catalog. The top panel in each plot shows the redshift posterior
distribution function based only on ugriz fluxes; the lower panels add UV (NUV and FUV)
and NIR measurements (in YJHK). The vertical line shows the spectroscopic redshift. The
horizontal line represents the uniform distribution over 0.3 < z < 5.5.
Fig. 13. Same as Figure 12 but for objects from the fainter test sample, composed of
quasars from the 2SLAQ survey and from the BOSS in SDSS Stripe 82.
Fig. 14. Average number of peaks in the posterior distribution function for the photometric
redshift as a function of spectroscopic redshift for the SDSS DR7 quasar sample. A
peak is defined as a contiguous region where the posterior distribution exceeds the uniform
distribution on 0.3 < z < 5.5. The top panel uses photometric redshift predictions from only
ugriz data; the lower panels add UV and NIR data. The optical-only curve is repeated in
the lower panels in gray.
Fig. 15. Spectroscopic versus photometric redshift for quasars from the SDSS DR7 quasar
catalog. Photometric redshifts are maximum a posteriori redshifts, i.e., they are at the
peak of the photometric redshift posterior distribution function. The left column shows all
sources; the right column shows sources that have only a single peak in their photometric
redshift posterior distribution, that is, they have only one contiguous region in their posterior
distribution where the distribution exceeds the uniform distribution on 0.3 < z < 5.5.
The top row shows predictions based only on ugriz fluxes; the lower panels add UV and
NIR information, restricted to those objects that were observed in both NUV and FUV for
GALEX, and in all four YJHK UKIDSS filters. The one-to-one line is shown in black and
the |Δz| = 0.3 lines are shown in gray.
Fig. 16. Same as Figure 15 but for a random sample of 10 percent of objects from the
SDSS DR7 quasar catalog, with the model trained on the remaining 90 percent of quasars.
Fig. 17. Spectroscopic versus photometric redshifts for the i > 20.1 subset of the SDSS
DR7 quasar catalog as well as for the faint test sample composed of quasars from the 2SLAQ
survey and the BOSS. The two columns on the left are as for Figure 15 but restricted to
those objects with i > 20.1 mag. The two columns on the right are as for the leftmost
columns, but for the 2SLAQ + BOSS test sample.
Fig. 18. Distribution of the peak of the photometric redshift distribution for all objects in
the expected SDSS-III BOSS survey in a few apparent-magnitude ranges. The curves are
the redshift priors calculated from the Hopkins, Richards, & Hernquist (2007)
luminosity-function model. The color coding is the same as in Figure 1. The overall shape of the
redshift distribution is similar to the prior distribution, except for the drop in 2.5 < z < 3.5
due to the decreased efficiency of photometric quasar classification based on SDSS photometry.



